Model-based clustering for multivariate partial ranking data

Authors

  • Julien Jacques
  • Christophe Biernacki

Abstract

This paper proposes the first model-based clustering algorithm dedicated to multivariate partial ranking data. It extends the Insertion Sorting Rank (isr) model for ranking data, a meaningful and effective model obtained by treating the process that generates a ranking as a sorting algorithm. The heterogeneity of the rank population is modelled by a mixture of isr models, while a conditional independence assumption extends the model to multivariate rankings. Maximum likelihood estimation is performed with a SEM-Gibbs algorithm, and partial rankings are treated as missing data, which allows them to be simulated during the estimation process. After the estimation algorithm is validated on simulations, three real datasets are studied: the 1980 American Psychological Association (APA) presidential election votes, the results of French students on a general knowledge test, and the votes of European countries in the Eurovision song contest. For each application, the proposed model fits the data well and leads to meaningful interpretations. In particular, it reveals regional alliances between European countries in the Eurovision contest, which have often been suspected but never proved.

Key-words: Multivariate ranking, partial ranking, mixture model, Insertion Sort Rank, SEM algorithm, Gibbs sampling
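
To make the idea of a ranking produced by a sorting algorithm concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation and not the Rankcluster API: the abstract says only that the generating process is assumed to be a sorting algorithm, so the random presentation order, the left-to-right comparison scheme, the per-comparison correctness probability `pi`, and all function names are hypothetical choices made for this example.

```python
import random


def noisy_insertion_sort_ranking(mu, pi, rng):
    """Simulate one ordering from a noisy insertion sort (illustrative only).

    mu  : reference ordering, most preferred object first (assumed parameter)
    pi  : probability that one pairwise comparison agrees with mu (assumed)
    rng : random.Random instance, so that runs are reproducible
    """
    position = {obj: i for i, obj in enumerate(mu)}
    presentation = list(mu)
    rng.shuffle(presentation)  # assumption: objects arrive in random order

    ordered = []
    for obj in presentation:
        insert_at = len(ordered)
        # Compare the new object with the already sorted ones, left to right.
        for j, other in enumerate(ordered):
            truly_before = position[obj] < position[other]
            # Each comparison is correct with probability pi.
            correct = rng.random() < pi
            judged_before = truly_before if correct else not truly_before
            if judged_before:
                insert_at = j
                break
        ordered.insert(insert_at, obj)
    return ordered


def sample_mixture(mus, pis, weights, n, rng):
    """Draw n orderings from a mixture of the noisy sorters above.

    Each draw first picks a cluster k with probability weights[k],
    then simulates an ordering with that cluster's mu and pi.
    """
    draws = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        draws.append((k, noisy_insertion_sort_ranking(mus[k], pis[k], rng)))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    mus = [list("ABCD"), list("DCBA")]
    for k, ordering in sample_mixture(mus, [0.9, 0.8], [0.5, 0.5], 5, rng):
        print(k, ordering)
```

When `pi` is 1 every comparison is correct and the simulated ordering equals `mu`; values closer to 0.5 make the orderings more dispersed around it. Partial rankings and the multivariate extension described above are not covered by this sketch.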

Similar resources

Rankcluster: An R Package for clustering multivariate partial ranking

Rankcluster is the first R package dedicated to ranking data. It offers modelling and clustering tools for ranking data, potentially multivariate and partial. Ranking data are modelled by the Insertion Sorting Rank (isr) model, a meaningful model parametrized by a central ranking and a dispersion parameter. A conditional independence assumption makes it possible to take into account m...

Rankcluster: An R package for clustering multivariate partial rankings

Rankcluster is the first R package to offer both modelling and clustering tools for ranking data, potentially multivariate and partial. Ranking data are modelled by the Insertion Sorting Rank (isr) model, a meaningful model parametrized by a central ranking and a dispersion parameter. A conditional independence assumption makes it possible to take into account multivariate rankings, and clusteri...

Clustering and Ranking University Majors using Data Mining and AHP algorithms: The case of Iran

Abstract: Although all university majors matter and the need for them is not in question, they may not all have the same priority given the different resources and strategies of a country. This paper focuses on clustering and ranking university majors in Iran. To do so, a model is presented to clarify the procedure. Eight different criteria are ...

Multivariate Estimation of Rock Mass Characteristics Respect to Depth Using ANFIS Based Subtractive Clustering- Khorramabad- Polezal Freeway Tunnels

A combination of an Adaptive Network-based Fuzzy Inference System (ANFIS) and subtractive clustering (SC) was used to estimate the deformation modulus (Em) and rock mass strength (UCSm), taking the depth of measurement into account. To do this, the ANFIS-based subtractive clustering (ANFISBSC) model was first trained on 125 measurements of 9 variables such as rock mass strength (UCSm), deformati...

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suitable only for crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

Publication date: 2012